Wishart distribution (wishart) — Random scatter / covariance matrices#

The Wishart distribution is a probability distribution over symmetric positive-definite (SPD) matrices. It is the multivariate generalization of the chi-square distribution, and it appears whenever you sum outer products of multivariate Gaussian vectors.

A canonical generative story (integer degrees of freedom):

  • Draw (x_1,\dots,x_\nu \stackrel{iid}{\sim} \mathcal{N}_p(0,\Sigma))

  • Form the scatter matrix (W = \sum_{i=1}^\nu x_i x_i^\top = X^\top X)

Then (W \sim \mathrm{Wishart}_p(\nu,\Sigma)).
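This construction is easy to spot-check numerically. A minimal sketch (the small (\nu) and (\Sigma) below are arbitrary choices): average many scatter matrices and compare against (\nu\Sigma).

```python
import numpy as np

rng = np.random.default_rng(0)
nu = 6
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# Draw n_rep batches of nu Gaussian vectors and form W = sum_i x_i x_i^T.
n_rep = 20_000
X = rng.multivariate_normal(np.zeros(2), Sigma, size=(n_rep, nu))  # (n_rep, nu, 2)
W = np.einsum("rni,rnj->rij", X, X)                                # scatter matrices

print(W.mean(axis=0))  # close to nu * Sigma = [[12, 3], [3, 6]]
```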

Learning goals#

  • Classify the distribution and understand its support/parameter space.

  • Write the PDF and interpret the matrix-valued CDF.

  • Compute/derive mean and covariance of entries, the (matrix) MGF, and entropy.

  • Implement NumPy-only sampling (Bartlett decomposition).

  • Visualize scalar marginals and Monte Carlo behavior.

  • Use scipy.stats.wishart (and understand what it does not implement).

Prerequisites#

  • Linear algebra: eigenvalues, determinants, trace, Cholesky

  • Multivariate normal distribution

  • Basic matrix calculus intuition (for the likelihood section)

Notebook roadmap#

  1. Title & Classification

  2. Intuition & Motivation

  3. Formal Definition

  4. Moments & Properties

  5. Parameter Interpretation

  6. Derivations

  7. Sampling & Simulation

  8. Visualization

  9. SciPy Integration

  10. Statistical Use Cases

  11. Pitfalls

  12. Summary

import os
import sys

import numpy as np
import plotly
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import scipy
from scipy import stats
from scipy.stats import wishart
from scipy.special import multigammaln, psi

pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")

np.set_printoptions(precision=4, suppress=True)

SEED = 7
rng = np.random.default_rng(SEED)

# Quick/slow toggle (mirrors patterns used elsewhere in this repo)
FAST_RUN = True
N_SAMPLES = 25_000 if FAST_RUN else 250_000

print("Python:", sys.version.split()[0])
print("NumPy:", np.__version__)
print("SciPy:", scipy.__version__)
print("Plotly:", plotly.__version__)
print("Seed:", SEED)
print("FAST_RUN:", FAST_RUN, "N_SAMPLES:", N_SAMPLES)
Python: 3.12.9
NumPy: 1.26.2
SciPy: 1.15.0
Plotly: 6.5.2
Seed: 7
FAST_RUN: True N_SAMPLES: 25000

1) Title & Classification#

  • Name: wishart (Wishart distribution)

  • Type: Continuous (matrix-valued distribution)

  • Support: the cone of (p\times p) symmetric positive-definite matrices

[ \mathcal{S}_p^{++} = \left\{W \in \mathbb{R}^{p\times p} : W=W^\top,\; x^\top W x > 0\;\;\forall x\neq 0\right\}. ]

  • Parameter space:

    • degrees of freedom: (\nu > p-1)

    • scale matrix: (\Sigma \in \mathcal{S}_p^{++})

Notation: [ W \sim \mathrm{Wishart}_p(\nu, \Sigma). ]

SciPy uses scipy.stats.wishart(df=nu, scale=Sigma).

Some texts require (\nu\in\mathbb{N}) (because of the Gaussian “sum of outer products” construction), but the PDF extends to real (\nu>p-1).

2) Intuition & Motivation#

2.1 What it models#

The Wishart distribution models a random SPD matrix that behaves like a noisy version of a target matrix (\Sigma). The most important example is the Gaussian scatter / covariance setting:

  • If (x_1,\dots,x_\nu \stackrel{iid}{\sim} \mathcal{N}_p(0,\Sigma)), then [ W = \sum_{i=1}^{\nu} x_i x_i^\top \sim \mathrm{Wishart}_p(\nu,\Sigma). ]

  • The (unbiased) sample covariance is (S = \frac{1}{\nu}W), so Wishart governs the sampling variability of covariance matrices.

2.2 Typical real-world use cases#

  • Multivariate statistics: sampling distribution of covariance matrices, PCA/FA uncertainty.

  • Hypothesis testing: tests about (\Sigma) (sphericity, equality of covariances, etc.).

  • Bayesian modeling: conjugate prior/posterior for the precision matrix of a multivariate normal.

  • Random matrix theory / simulation: generating random SPD matrices with controlled mean/scale.

2.3 Relations to other distributions#

  • Chi-square: for (p=1), Wishart reduces to a scaled chi-square distribution.

  • Gamma: (\chi^2_\nu) is a Gamma special case, so (p=1) Wishart is also (scaled) Gamma.

  • Inverse-Wishart: distribution of (W^{-1}); commonly used as a prior on covariance matrices.

  • Matrix normal: if (X) is matrix-normal, then (X^\top X) has a Wishart form.
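The (p=1) reduction to a scaled chi-square is directly checkable with SciPy (a sketch; the specific (\nu) and scale are arbitrary):

```python
import numpy as np
from scipy import stats

nu, sigma = 5.0, 2.0
w = np.linspace(0.5, 30.0, 7)

# Wishart_1(nu, sigma) density vs. chi-square after the change of variables
# W = sigma * Q with Q ~ chi^2_nu (Jacobian 1/sigma).
wishart_pdf = stats.wishart(df=nu, scale=sigma).pdf(w)
chi2_pdf = stats.chi2.pdf(w / sigma, df=nu) / sigma

print(np.allclose(wishart_pdf, chi2_pdf))  # True
```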

3) Formal Definition#

Let (W\in\mathcal{S}_p^{++}), (\nu>p-1), and (\Sigma\in\mathcal{S}_p^{++}).

3.1 Multivariate gamma function#

The multivariate gamma function is [ \Gamma_p(a) = \pi^{p(p-1)/4}\prod_{j=1}^{p}\Gamma\!\left(a + \frac{1-j}{2}\right). ]

SciPy provides the stable log version as scipy.special.multigammaln(a, p).
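As a sanity check, multigammaln can be compared against a direct evaluation of the product formula above (a sketch with arbitrary a and p):

```python
import numpy as np
from scipy.special import multigammaln, gammaln

a, p = 4.5, 3
j = np.arange(1, p + 1)

# log Gamma_p(a) = (p(p-1)/4) * log(pi) + sum_j log Gamma(a + (1-j)/2)
direct = (p * (p - 1) / 4) * np.log(np.pi) + np.sum(gammaln(a + (1 - j) / 2))

print(np.isclose(multigammaln(a, p), direct))  # True
```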

3.2 PDF#

The Wishart PDF is [ f(W\mid \nu,\Sigma) = \frac{ |W|^{(\nu-p-1)/2}\,\exp\!\left(-\tfrac{1}{2}\,\mathrm{tr}(\Sigma^{-1}W)\right)} {2^{\nu p/2}\,|\Sigma|^{\nu/2}\,\Gamma_p(\nu/2)} \quad\text{for } W\in\mathcal{S}_p^{++}. ]

3.3 CDF (matrix-valued)#

A natural CDF uses the Loewner (PSD) order (\preceq): [ F(X) = \mathbb{P}(W \preceq X) = \int_{0\prec W\preceq X} f(W)\,dW. ]

  • For (p>1), this CDF generally has no simple closed form; expressions involve special functions of a matrix argument.

  • In practice, you often work with scalar functionals (e.g., diagonal entries, trace, determinant) whose CDFs are easier.

3.4 Scalar special case ((p=1))#

If (p=1), then (W) is a positive scalar and [ W \sim \mathrm{Wishart}_1(\nu, \Sigma)\quad\Longleftrightarrow\quad \frac{W}{\Sigma} \sim \chi^2_{\nu}. ] So [ F(w) = \mathbb{P}(W\le w) = F_{\chi^2_{\nu}}\!\left(\frac{w}{\Sigma}\right). ]
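A quick Monte Carlo check of this scalar CDF, using the integer-(\nu) Gaussian construction (the values below are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
nu, sigma, w = 7, 1.5, 12.0

# For p=1, W is a sum of nu squares of N(0, sigma) draws.
draws = np.sum(rng.normal(0.0, np.sqrt(sigma), size=(200_000, nu)) ** 2, axis=1)

mc_cdf = float(np.mean(draws <= w))
theory_cdf = stats.chi2.cdf(w / sigma, df=nu)
print(round(mc_cdf, 3), round(theory_cdf, 3))  # agree to ~2 decimals
```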

def is_square_matrix(A: np.ndarray) -> bool:
    A = np.asarray(A)
    return A.ndim == 2 and A.shape[0] == A.shape[1]


def is_symmetric(A: np.ndarray, *, atol: float = 1e-10) -> bool:
    A = np.asarray(A)
    return is_square_matrix(A) and np.allclose(A, A.T, atol=atol)


def cholesky_spd(A: np.ndarray) -> np.ndarray:
    '''Return the Cholesky factor L of an SPD matrix A = L L^T (raises if not SPD).'''
    A = np.asarray(A, dtype=float)
    if not is_symmetric(A):
        raise ValueError("Matrix must be symmetric.")
    return np.linalg.cholesky(A)


def validate_scale(scale: np.ndarray) -> np.ndarray:
    scale = np.asarray(scale, dtype=float)
    if not is_square_matrix(scale):
        raise ValueError(f"scale must be a square 2D matrix, got shape={scale.shape}")
    if not is_symmetric(scale):
        raise ValueError("scale must be symmetric")
    _ = cholesky_spd(scale)  # raises if not SPD
    return scale


def validate_df(df: float, p: int) -> float:
    df = float(df)
    if not (df > p - 1):
        raise ValueError(f"df must be > p-1 (p={p}), got df={df}")
    return df


def wishart_logpdf_numpy(W: np.ndarray, df: float, scale: np.ndarray) -> float:
    '''Log-PDF of Wishart_p(df, scale) at a single SPD matrix W.'''
    scale = validate_scale(scale)
    p = scale.shape[0]
    df = validate_df(df, p)

    W = np.asarray(W, dtype=float)
    if W.shape != (p, p):
        raise ValueError(f"W must have shape {(p, p)}, got {W.shape}")
    if not is_symmetric(W):
        raise ValueError("W must be symmetric")

    # W must be SPD for the density to be finite.
    _ = cholesky_spd(W)

    sign_w, logdet_w = np.linalg.slogdet(W)
    sign_s, logdet_s = np.linalg.slogdet(scale)
    if sign_w <= 0 or sign_s <= 0:
        return -np.inf

    trace_term = np.trace(np.linalg.solve(scale, W))

    log_norm = (df * p / 2) * np.log(2.0) + (df / 2) * logdet_s + multigammaln(df / 2, p)
    return 0.5 * (df - p - 1) * logdet_w - 0.5 * trace_term - log_norm


def wishart_pdf_numpy(W: np.ndarray, df: float, scale: np.ndarray) -> float:
    return float(np.exp(wishart_logpdf_numpy(W, df, scale)))


def wishart_mean(df: float, scale: np.ndarray) -> np.ndarray:
    scale = validate_scale(scale)
    df = validate_df(df, scale.shape[0])
    return df * scale


def wishart_cov_entry(df: float, scale: np.ndarray, i: int, j: int, k: int, l: int) -> float:
    '''Cov(W_ij, W_kl) for W ~ Wishart_p(df, scale).'''
    scale = validate_scale(scale)
    p = scale.shape[0]
    df = validate_df(df, p)
    for idx in (i, j, k, l):
        if not (0 <= idx < p):
            raise IndexError("index out of bounds")
    return df * (scale[i, k] * scale[j, l] + scale[i, l] * scale[j, k])


def wishart_var_matrix(df: float, scale: np.ndarray) -> np.ndarray:
    '''Elementwise variances Var(W_ij).'''
    scale = validate_scale(scale)
    p = scale.shape[0]
    df = validate_df(df, p)

    out = np.empty((p, p), dtype=float)
    for i in range(p):
        for j in range(p):
            out[i, j] = wishart_cov_entry(df, scale, i, j, i, j)
    return out


def wishart_expected_logdet(df: float, scale: np.ndarray) -> float:
    '''E[log |W|] for W ~ Wishart_p(df, scale).'''
    scale = validate_scale(scale)
    p = scale.shape[0]
    df = validate_df(df, p)

    sign_s, logdet_s = np.linalg.slogdet(scale)
    if sign_s <= 0:
        return np.nan

    terms = psi((df + 1 - np.arange(1, p + 1)) / 2)
    return float(logdet_s + p * np.log(2.0) + np.sum(terms))


def wishart_entropy(df: float, scale: np.ndarray) -> float:
    '''Differential entropy H(W) for W ~ Wishart_p(df, scale).'''
    scale = validate_scale(scale)
    p = scale.shape[0]
    df = validate_df(df, p)

    _, logdet_s = np.linalg.slogdet(scale)
    digamma_sum = float(np.sum(psi((df + 1 - np.arange(1, p + 1)) / 2)))

    return float(
        (df * p / 2)
        + multigammaln(df / 2, p)
        + ((p + 1) / 2) * logdet_s
        + (p * (p + 1) / 2) * np.log(2.0)
        - ((df - p - 1) / 2) * digamma_sum
    )


def wishart_mgf(T: np.ndarray, df: float, scale: np.ndarray) -> float:
    '''Matrix MGF: M(T)=E[exp(tr(TW))].

    Domain: requires I - 2*T*scale to have positive eigenvalues; with the
    Cholesky factor scale = L L^T, this is equivalent to I - 2 L^T T L
    being SPD.
    '''
    scale = validate_scale(scale)
    p = scale.shape[0]
    df = validate_df(df, p)

    T = np.asarray(T, dtype=float)
    if T.shape != (p, p) or not is_symmetric(T):
        raise ValueError(f"T must be a symmetric {(p,p)} matrix")

    # det(I - 2 T Sigma) = det(I - 2 L^T T L) by Sylvester's identity; the
    # congruent form is symmetric, which makes the SPD domain check reliable.
    L = np.linalg.cholesky(scale)
    A = np.eye(p) - 2.0 * (L.T @ T @ L)
    A = 0.5 * (A + A.T)

    _ = cholesky_spd(A)  # domain check
    sign, logdet = np.linalg.slogdet(A)
    if sign <= 0:
        return np.nan
    return float(np.exp(-(df / 2) * logdet))

4) Moments & Properties#

Let (W \sim \mathrm{Wishart}_p(\nu,\Sigma)).

4.1 Mean#

[ \mathbb{E}[W] = \nu\,\Sigma. ]

4.2 Covariance structure (entrywise)#

For indices (i,j,k,\ell\in\{1,\dots,p\}), [ \mathrm{Cov}(W_{ij}, W_{k\ell}) = \nu\,\big(\Sigma_{ik}\Sigma_{j\ell} + \Sigma_{i\ell}\Sigma_{jk}\big). ] In particular, [ \mathrm{Var}(W_{ij}) = \nu\,(\Sigma_{ii}\Sigma_{jj}+\Sigma_{ij}^2). ]

4.3 Useful scalar marginals#

Even though (W) is matrix-valued, some scalar pieces are simple:

  • Diagonal entries: for every (i), [ \frac{W_{ii}}{\Sigma_{ii}} \sim \chi^2_{\nu}. ] So

    • (\mathbb{E}[W_{ii}] = \nu\Sigma_{ii})

    • (\mathrm{Var}(W_{ii}) = 2\nu\Sigma_{ii}^2)

    • skewness (=\sqrt{8/\nu})

    • excess kurtosis (=12/\nu)

  • Trace (special case): if (\Sigma=I), then [ \mathrm{tr}(W) \sim \chi^2_{\nu p} ] because it is the sum of (\nu p) independent standard-normal squares.
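Both marginal facts are easy to verify by simulation (a sketch; (\nu=9), (p=3), and (\Sigma=I) are arbitrary choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
nu, p = 9, 3
Ws = stats.wishart(df=nu, scale=np.eye(p)).rvs(size=50_000, random_state=rng)

# Diagonal entry: W_11 / Sigma_11 ~ chi^2_nu, so E[W_11] = nu here.
print(round(float(Ws[:, 0, 0].mean()), 2))  # ~ 9

# Trace with Sigma = I: tr(W) ~ chi^2_{nu p}, mean nu*p, variance 2*nu*p.
tr = np.trace(Ws, axis1=1, axis2=2)
print(round(float(tr.mean()), 2), round(float(tr.var()), 1))  # ~ 27 and ~ 54
```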

4.4 MGF / characteristic function (matrix argument)#

For a symmetric matrix (T) such that (I-2T\Sigma) is SPD, [ M(T) = \mathbb{E}[\exp(\mathrm{tr}(TW))] = |I-2T\Sigma|^{-\nu/2}. ] The characteristic function is the same with (T\mapsto iT).
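The MGF formula can be spot-checked by Monte Carlo for a (T) inside the domain (a sketch; the small symmetric (T) below is an arbitrary choice):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
nu = 6
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.8]])
T = np.array([[0.05, 0.0],
              [0.0, -0.1]])  # small enough that I - 2*T*Sigma is SPD

Ws = stats.wishart(df=nu, scale=Sigma).rvs(size=200_000, random_state=rng)
# tr(T W) per sample; einsum works because W is symmetric.
mc = float(np.mean(np.exp(np.einsum("ij,sij->s", T, Ws))))
theory = float(np.linalg.det(np.eye(2) - 2 * T @ Sigma) ** (-nu / 2))

print(round(mc, 3), round(theory, 3))  # close
```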

4.5 Mode#

If (\nu \ge p+1), the mode is [ W_{\text{mode}} = (\nu-p-1)\,\Sigma. ]
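A quick check that SciPy's mode agrees with the closed form (the (\nu) and (\Sigma) below are arbitrary):

```python
import numpy as np
from scipy.stats import wishart

nu = 8
Sigma = np.array([[1.5, 0.4],
                  [0.4, 1.0]])
p = Sigma.shape[0]

rv = wishart(df=nu, scale=Sigma)
print(np.allclose(rv.mode(), (nu - p - 1) * Sigma))  # True
```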

4.6 Entropy#

A closed form exists in terms of the multivariate gamma and digamma functions. One convenient form is [ \mathbb{E}[\log|W|] = \log|\Sigma| + p\log 2 + \sum_{i=1}^p \psi\!\left(\frac{\nu+1-i}{2}\right) ] and [ H(W) = \frac{\nu p}{2} + \log\Gamma_p\!\left(\frac{\nu}{2}\right) + \frac{p+1}{2}\log|\Sigma| + \frac{p(p+1)}{2}\log 2 - \frac{\nu-p-1}{2}\sum_{i=1}^p \psi\!\left(\frac{\nu+1-i}{2}\right). ]

df = 8
scale = np.array([
    [1.5, 0.4],
    [0.4, 1.0],
])

rv = wishart(df=df, scale=scale)
W0 = rv.rvs(random_state=rng)

print("Example W:")
print(np.round(W0, 4))
print()
print("Mean (theory):")
print(wishart_mean(df, scale))
print("Mean (SciPy):")
print(rv.mean())
print()
print("Var elementwise (theory):")
print(np.round(wishart_var_matrix(df, scale), 6))
print("Var elementwise (SciPy):")
print(np.round(rv.var(), 6))
print()
print("logpdf (ours):", wishart_logpdf_numpy(W0, df, scale))
print("logpdf (SciPy):", rv.logpdf(W0))
print("entropy (ours):", wishart_entropy(df, scale))
print("entropy (SciPy):", rv.entropy())
Example W:
[[12.807   3.4193]
 [ 3.4193  4.1853]]

Mean (theory):
[[12.   3.2]
 [ 3.2  8. ]]
Mean (SciPy):
[[12.   3.2]
 [ 3.2  8. ]]

Var elementwise (theory):
[[36.   13.28]
 [13.28 16.  ]]
Var elementwise (SciPy):
[[36.   13.28]
 [13.28 16.  ]]

logpdf (ours): -7.04274089576966
logpdf (SciPy): -7.04274089576966
entropy (ours): 8.185358204431287
entropy (SciPy): 8.185358204431287

5) Parameter Interpretation#

Degrees of freedom (\nu)#

  • Controls the amount of information in the scatter matrix.

  • In the Gaussian construction, (\nu) is literally the number of vectors being summed (a sample size).

  • As (\nu) increases, (W/\nu) concentrates around (\Sigma) with relative fluctuations on the order of (1/\sqrt{\nu}).

Scale matrix (\Sigma)#

  • Sets the mean shape: (\mathbb{E}[W]=\nu\Sigma).

  • Encodes directions and scales via eigen-decomposition (\Sigma = Q\Lambda Q^\top):

    • the eigenvectors (Q) describe preferred directions

    • the eigenvalues (\Lambda) describe how spread those directions are on average

Shape changes (high level)#

  • Increasing (\nu): distribution tightens around (\nu\Sigma) (or around (\Sigma) if you look at (W/\nu)).

  • Increasing overall scale (e.g., (\Sigma\mapsto c\Sigma)): multiplies (W) by (c).

  • Changing correlations in (\Sigma): changes how strongly entries/eigenvalues of (W) move together.

In Section 8 we’ll visualize these effects via scalar summaries (diagonal entries, eigenvalues, log-determinant).
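The (1/\sqrt{\nu}) rate itself can be spot-checked: quadrupling (\nu) should roughly halve the typical Frobenius error of (W/\nu) (a sketch; the df values and sample counts below are arbitrary).

```python
import numpy as np
from scipy.stats import wishart

rng = np.random.default_rng(6)
Sigma = np.array([[1.0, 0.4],
                  [0.4, 1.0]])

def mean_frobenius_error(nu, n=20_000):
    # Average distance of W/nu from the scale matrix over n draws.
    Ws = wishart(df=nu, scale=Sigma).rvs(size=n, random_state=rng)
    return float(np.linalg.norm(Ws / nu - Sigma, axis=(1, 2)).mean())

e10, e40 = mean_frobenius_error(10), mean_frobenius_error(40)
print(round(e10 / e40, 2))  # roughly 2, consistent with a 1/sqrt(nu) rate
```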

6) Derivations#

These derivations use the Gaussian construction with integer (\nu): (x_1,\dots,x_\nu \stackrel{iid}{\sim} \mathcal{N}_p(0,\Sigma)), (W=\sum_i x_i x_i^\top). The resulting formulas extend to real (\nu>p-1).

6.1 Expectation#

[ \mathbb{E}[W] = \sum_{i=1}^{\nu} \mathbb{E}[x_i x_i^\top] = \nu\,\Sigma. ]

6.2 Variance / covariance of entries#

Write (W_{ij}=\sum_{r=1}^{\nu} x_{r,i}x_{r,j}). Using independence across (r) and Isserlis’ theorem for Gaussian fourth moments, [ \mathrm{Cov}(W_{ij}, W_{k\ell}) = \nu\,\mathrm{Cov}(x_{1,i}x_{1,j},\, x_{1,k}x_{1,\ell}) = \nu\,(\Sigma_{ik}\Sigma_{j\ell} + \Sigma_{i\ell}\Sigma_{jk}). ]
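A Monte Carlo check of this covariance formula via the Gaussian construction (arbitrary (\nu), (\Sigma), and index choice):

```python
import numpy as np

rng = np.random.default_rng(4)
nu = 10
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])

# Scatter matrices W = X^T X from batches of nu Gaussian vectors.
X = rng.multivariate_normal(np.zeros(2), Sigma, size=(100_000, nu))
W = np.einsum("rni,rnj->rij", X, X)

# Cov(W_11, W_12) = nu * (Sigma_11 Sigma_12 + Sigma_12 Sigma_11) = 12 here.
cov_mc = float(np.cov(W[:, 0, 0], W[:, 0, 1])[0, 1])
cov_theory = nu * (Sigma[0, 0] * Sigma[0, 1] + Sigma[0, 1] * Sigma[0, 0])
print(round(cov_mc, 2), cov_theory)
```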

6.3 Likelihood (scale matrix (\Sigma))#

Given one observation (W), the log-likelihood (up to constants in (W)) is [ \ell(\Sigma;W) = -\frac{\nu}{2}\log|\Sigma| - \frac{1}{2}\mathrm{tr}(\Sigma^{-1}W) + \text{const}. ] Differentiating w.r.t. (\Sigma^{-1}) yields the MLE [ \widehat{\Sigma}_{\text{MLE}} = \frac{1}{\nu}W. ]

For (m) i.i.d. observations (W_1,\dots,W_m) with common (\nu), [ \widehat{\Sigma}_{\text{MLE}} = \frac{1}{m\nu}\sum_{t=1}^m W_t. ]

Estimating (\nu) jointly with (\Sigma) requires solving a nonlinear equation (involving digamma functions); we’ll use a simple method-of-moments estimator as a practical alternative.

def wishart_fit_scale_mle(samples: np.ndarray, df: float) -> np.ndarray:
    '''MLE of scale given df for i.i.d. Wishart draws.'''
    samples = np.asarray(samples, dtype=float)
    if samples.ndim != 3 or samples.shape[1] != samples.shape[2]:
        raise ValueError("samples must have shape (n, p, p)")
    p = samples.shape[1]
    df = validate_df(df, p)
    return samples.mean(axis=0) / df


def wishart_fit_df_mom(samples: np.ndarray) -> tuple[float, np.ndarray]:
    '''Method-of-moments df estimate using diagonal entries.

    Uses: Var(W_ii) / E[W_ii]^2 = 2/df  (since W_ii / Sigma_ii ~ chi^2_df).
    '''
    samples = np.asarray(samples, dtype=float)
    if samples.ndim != 3 or samples.shape[1] != samples.shape[2]:
        raise ValueError("samples must have shape (n, p, p)")

    diag = np.diagonal(samples, axis1=1, axis2=2)  # (n, p)
    m = diag.mean(axis=0)
    v = diag.var(axis=0, ddof=0)

    if np.any(m <= 0):
        raise ValueError("Diagonal means must be positive for MOM df estimator")

    df_hats = 2.0 / (v / (m**2))
    df_hat = float(np.median(df_hats))
    return df_hat, df_hats


def wishart_fit_df_scale_mom(samples: np.ndarray) -> dict:
    df_hat, df_hats = wishart_fit_df_mom(samples)
    scale_hat = wishart_fit_scale_mle(samples, df_hat)
    return {"df": df_hat, "df_by_diag": df_hats, "scale": scale_hat}


# Demo: recover df and scale from synthetic samples
p = 3
scale_true = np.array([
    [1.0, 0.4, 0.2],
    [0.4, 1.7, 0.1],
    [0.2, 0.1, 0.9],
])
df_true = 12

rv_true = wishart(df=df_true, scale=scale_true)
samples = rv_true.rvs(size=8_000 if FAST_RUN else 40_000, random_state=rng)

fit = wishart_fit_df_scale_mom(samples)

print("df true:", df_true)
print("df hat :", round(fit["df"], 3))
print("df hats by diagonal:", np.round(fit["df_by_diag"], 3))
print()
print("scale true:")
print(np.round(scale_true, 3))
print("scale hat :")
print(np.round(fit["scale"], 3))
df true: 12
df hat : 12.132
df hats by diagonal: [12.132 11.919 12.216]

scale true:
[[1.  0.4 0.2]
 [0.4 1.7 0.1]
 [0.2 0.1 0.9]]
scale hat :
[[0.984 0.396 0.194]
 [0.396 1.688 0.094]
 [0.194 0.094 0.889]]

7) Sampling & Simulation (NumPy-only)#

The most common efficient sampler is the Bartlett decomposition.

7.1 Bartlett decomposition (idea)#

For (W \sim \mathrm{Wishart}_p(\nu, I)):

  1. Build a lower-triangular matrix (A) such that

    • diagonal: (A_{ii}^2 \sim \chi^2_{\nu-i+1})

    • sub-diagonal: (A_{ij}\sim \mathcal{N}(0,1)) for (i>j)

    • above diagonal: (0)

  2. Then [ W = AA^\top. ]

For general scale (\Sigma): if (\Sigma = LL^\top) (Cholesky), then [ W = LAA^\top L^\top. ]

This works for any real (\nu>p-1) because chi-square is defined for non-integer degrees of freedom.

7.2 Direct Gaussian construction (also NumPy-only)#

If (\nu\in\mathbb{N}), you can also sample (x_r\sim \mathcal{N}_p(0,\Sigma)) and return (\sum_r x_r x_r^\top). It is conceptually simple but can be slower for large (\nu).

def wishart_rvs_bartlett(df: float, scale: np.ndarray, *, size: int = 1, rng=None) -> np.ndarray:
    '''NumPy-only sampling via Bartlett decomposition.

    Returns shape (p, p) if size=1, else (size, p, p).
    '''
    if rng is None:
        rng = np.random.default_rng()

    scale = validate_scale(scale)
    p = scale.shape[0]
    df = validate_df(df, p)

    L = np.linalg.cholesky(scale)

    out = np.empty((size, p, p), dtype=float)
    for s in range(size):
        A = np.zeros((p, p), dtype=float)
        for i in range(p):
            A[i, i] = np.sqrt(rng.chisquare(df - i))
            if i > 0:
                A[i, :i] = rng.standard_normal(i)

        LA = L @ A
        W = LA @ LA.T
        out[s] = W

    return out[0] if size == 1 else out


def wishart_rvs_normals(df: float, scale: np.ndarray, *, size: int = 1, rng=None) -> np.ndarray:
    '''NumPy-only sampling via the Gaussian construction (requires integer df).'''
    if rng is None:
        rng = np.random.default_rng()

    df_int = int(df)
    if df_int != df:
        raise ValueError("Gaussian construction requires integer df")

    scale = validate_scale(scale)
    p = scale.shape[0]
    df = validate_df(df, p)

    L = np.linalg.cholesky(scale)

    Z = rng.standard_normal(size=(size, df_int, p))
    X = Z @ L.T  # each row ~ N(0, scale)

    # W = X^T X for each sample
    W = np.einsum("sni,snj->sij", X, X)
    return W[0] if size == 1 else W


# Quick sanity check: sampler returns SPD
scale_demo = np.array([
    [1.5, 0.4],
    [0.4, 1.0],
])
W_demo = wishart_rvs_bartlett(8, scale_demo, size=1, rng=rng)
_ = np.linalg.cholesky(W_demo)  # should not raise
W_demo
array([[ 2.356 , -0.1647],
       [-0.1647,  5.092 ]])

8) Visualization#

Because Wishart is matrix-valued, direct visualization of its full PDF/CDF is difficult for (p>2). A practical approach is to visualize:

  • Scalar marginals that have known forms (e.g., diagonal entries)

  • Scalar summaries (trace, determinant, eigenvalues)

Below we show:

  1. A single Monte Carlo draw (W) (as a heatmap)

  2. PDF/CDF of a diagonal entry (W_{11}) (a scaled chi-square)

  3. Scatter of ((W_{11}, W_{22})) to show dependence

  4. How (\nu) changes concentration of (W/\nu)

df = 8
scale = np.array([
    [1.5, 0.4],
    [0.4, 1.0],
])

Ws = wishart_rvs_bartlett(df, scale, size=N_SAMPLES, rng=rng)

W_one = Ws[0]
fig = px.imshow(W_one, text_auto=".3f", title="One Wishart draw W (heatmap)")
fig.update_layout(coloraxis_showscale=True)
fig.show()

w11 = Ws[:, 0, 0]
w22 = Ws[:, 1, 1]

# Skewness/kurtosis of a diagonal element (should match chi-square)
skew_theory = np.sqrt(8.0 / df)
excess_kurt_theory = 12.0 / df

skew_mc = stats.skew(w11, bias=False)
excess_kurt_mc = stats.kurtosis(w11, fisher=True, bias=False)

print("W_11 skewness theory:", round(skew_theory, 4), "MC:", round(float(skew_mc), 4))
print("W_11 excess kurtosis theory:", round(excess_kurt_theory, 4), "MC:", round(float(excess_kurt_mc), 4))

# --- PDF of W_11 ---
x = np.linspace(np.percentile(w11, 0.5), np.percentile(w11, 99.5), 400)
pdf_theory = stats.chi2.pdf(x / scale[0, 0], df=df) / scale[0, 0]

fig = go.Figure()
fig.add_trace(go.Histogram(x=w11, nbinsx=60, histnorm="probability density", name="MC (W_11)"))
fig.add_trace(go.Scatter(x=x, y=pdf_theory, mode="lines", name="Theory (scaled chi-square)"))
fig.update_layout(
    title="Wishart diagonal marginal: PDF of W_11",
    xaxis_title="w",
    yaxis_title="density",
)
fig.show()

# --- CDF of W_11 ---
xs = np.sort(w11)
ecdf = np.arange(1, xs.size + 1) / xs.size
cdf_theory = stats.chi2.cdf(xs / scale[0, 0], df=df)

fig = go.Figure()
fig.add_trace(go.Scatter(x=xs, y=ecdf, mode="lines", name="Empirical CDF"))
fig.add_trace(go.Scatter(x=xs, y=cdf_theory, mode="lines", name="Theory CDF"))
fig.update_layout(
    title="Wishart diagonal marginal: CDF of W_11",
    xaxis_title="w",
    yaxis_title="F(w)",
)
fig.show()

# --- Dependence between diagonal entries ---
fig = px.scatter(
    x=w11,
    y=w22,
    opacity=0.25,
    title="Monte Carlo samples: (W_11, W_22)",
    labels={"x": "W_11", "y": "W_22"},
)
fig.show()
W_11 skewness theory: 1.0 MC: 0.9817
W_11 excess kurtosis theory: 1.5 MC: 1.4707
# Effect of df on concentration of W/df around scale

scale3 = np.array([
    [1.0, 0.6, 0.2],
    [0.6, 1.8, 0.3],
    [0.2, 0.3, 0.7],
])
p = scale3.shape[0]

dfs = [p + 0.5, 10, 40]
ns = 5_000 if FAST_RUN else 25_000

rows = []
for df0 in dfs:
    Ws0 = wishart_rvs_bartlett(df0, scale3, size=ns, rng=rng)
    # Frobenius distance of W/df to the target scale
    dist = np.linalg.norm(Ws0 / df0 - scale3, axis=(1, 2))
    rows.append({"df": np.full(ns, df0), "dist": dist})

df_col = np.concatenate([r["df"] for r in rows])
dist_col = np.concatenate([r["dist"] for r in rows])

fig = px.histogram(
    x=dist_col,
    color=df_col.astype(str),
    nbins=70,
    barmode="overlay",
    opacity=0.55,
    title="Concentration increases with df: ||W/df - scale||_F",
    labels={"x": "Frobenius distance", "color": "df"},
)
fig.show()

9) SciPy Integration (scipy.stats.wishart)#

SciPy provides a frozen Wishart distribution via:

rv = scipy.stats.wishart(df=nu, scale=Sigma)

Available methods (SciPy 1.15):

  • pdf, logpdf

  • rvs

  • mean, var, mode, entropy

Notably missing (as of SciPy 1.15):

  • cdf (matrix CDF is nontrivial)

  • fit (no built-in MLE)

Workarounds:

  • For CDF-like quantities, use scalar marginals (e.g., diagonal entries) or Monte Carlo estimates of (\mathbb{P}(W\preceq X)).

  • For fitting, use the closed-form MLE for (\Sigma) given (\nu), or a method-of-moments / numerical MLE for (\nu).

df = 8
scale = np.array([
    [1.5, 0.4],
    [0.4, 1.0],
])

rv = wishart(df=df, scale=scale)
W = rv.rvs(random_state=rng)

print("W:")
print(np.round(W, 4))
print("pdf:", rv.pdf(W))
print("logpdf:", rv.logpdf(W))
print("mean:")
print(rv.mean())
print("var (elementwise):")
print(np.round(rv.var(), 6))
print("mode:")
print(np.round(rv.mode(), 4))
print("entropy:", rv.entropy())
print()
print("has cdf?", hasattr(rv, "cdf"))
print("has fit?", hasattr(rv, "fit"))

# A "CDF" we *can* compute: diagonal marginal (scaled chi-square)
w = float(W[0, 0])
print()
print("P(W_11 <= w) via chi-square:", stats.chi2.cdf(w / scale[0, 0], df=df))
W:
[[14.1372  1.9336]
 [ 1.9336  1.8141]]
pdf: 0.0002542559011445013
logpdf: -8.277169313297081
mean:
[[12.   3.2]
 [ 3.2  8. ]]
var (elementwise):
[[36.   13.28]
 [13.28 16.  ]]
mode:
[[7.5 2. ]
 [2.  5. ]]
entropy: 8.185358204431287

has cdf? False
has fit? False

P(W_11 <= w) via chi-square: 0.6922652617053153
# Monte Carlo estimate of the matrix CDF: P(W \preceq X)

scale = np.array([
    [1.2, 0.3],
    [0.3, 0.9],
])
df = 9

# Pick a threshold X (SPD). For example, a multiple of the mean.
X = 1.15 * wishart_mean(df, scale)

Ws = wishart_rvs_bartlett(df, scale, size=30_000 if FAST_RUN else 150_000, rng=rng)

# W \preceq X  <=>  X - W is PSD. For 2x2, PSD can be checked by eigenvalues.
# (For larger p you would use a Cholesky attempt on X-W, but numerical noise matters.)

mats = X[None, :, :] - Ws

# numerical tolerance: allow tiny negative eigenvalues from floating-point error
lmin = np.linalg.eigvalsh(mats)[:, 0]
prob_hat = np.mean(lmin >= -1e-10)

print("X:")
print(np.round(X, 3))
print(r"Monte Carlo P(W \preceq X) ≈", prob_hat)
X:
[[12.42   3.105]
 [ 3.105  9.315]]
Monte Carlo P(W \preceq X) ≈ 0.3414

10) Statistical Use Cases#

10.1 Hypothesis testing#

If (x_1,\dots,x_\nu\sim\mathcal{N}_p(0,\Sigma)), then the scatter matrix (W=\sum_i x_i x_i^\top) is Wishart. This gives exact sampling distributions for statistics built from (W).

A simple example uses whitening under a null (\Sigma_0): [ \Sigma_0^{-1/2}\, W\, \Sigma_0^{-1/2} \sim \mathrm{Wishart}_p(\nu, I) ] when (\Sigma=\Sigma_0). Then (\mathrm{tr}(\Sigma_0^{-1} W)\sim\chi^2_{\nu p}) provides an exact test of total variance.

10.2 Bayesian modeling#

Wishart is conjugate to the precision matrix (\Lambda=\Sigma^{-1}) in a multivariate normal model. With known mean (say 0):

  • Prior: (\Lambda \sim \mathrm{Wishart}_p(\nu_0, S_0))

  • Data: (x_i\mid\Lambda \sim \mathcal{N}_p(0,\Lambda^{-1}))

  • Let (S=\sum_i x_i x_i^\top). Posterior: [ \Lambda\mid x_{1:n} \sim \mathrm{Wishart}_p\big(\nu_0+n,\; (S_0^{-1}+S)^{-1}\big). ]

10.3 Generative modeling#

Wishart gives a principled way to generate random SPD matrices that can be used as:

  • random covariance / precision matrices

  • random kernels for Gaussian processes

  • synthetic correlation structures for simulation studies

# 10.1 Hypothesis test example via whitening + trace statistic

p = 3
nu = 15

Sigma0 = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

# Create data under an alternative (not equal to Sigma0)
Sigma_true = np.array([
    [1.0, 0.6, 0.2],
    [0.6, 1.5, 0.1],
    [0.2, 0.1, 0.7],
])

X = rng.multivariate_normal(mean=np.zeros(p), cov=Sigma_true, size=nu)
W = X.T @ X

# Under H0: Sigma=Sigma0, the whitened scatter is Wishart(nu, I)
# and tr(Sigma0^{-1} W) = tr(W) ~ chi2_{nu*p}.

T = float(np.trace(np.linalg.solve(Sigma0, W)))
p_value = 1.0 - stats.chi2.cdf(T, df=nu * p)

print("Trace test statistic T=tr(Sigma0^{-1} W):", round(T, 4))
print("p-value (right tail):", p_value)

# Note: this test is sensitive to overall variance (trace), not all aspects of covariance.
Trace test statistic T=tr(Sigma0^{-1} W): 30.0502
p-value (right tail): 0.9574804185486624
# 10.2 Bayesian update: Wishart prior on precision

p = 2
n = 30

Sigma_true = np.array([
    [1.0, 0.7],
    [0.7, 1.8],
])

# Data model: x ~ N(0, Sigma_true)
X = rng.multivariate_normal(mean=np.zeros(p), cov=Sigma_true, size=n)
S = X.T @ X

# Prior on precision Lambda = Sigma^{-1}
nu0 = p + 2  # must be > p-1
S0 = np.eye(p)  # prior scale (on precision)

nu_post = nu0 + n
S_post = np.linalg.inv(np.linalg.inv(S0) + S)

post = wishart(df=nu_post, scale=S_post)
Lambda_samples = post.rvs(size=3_000 if FAST_RUN else 15_000, random_state=rng)

# Posterior mean of precision: E[Lambda] = nu_post * S_post
Lambda_mean = nu_post * S_post

# Convert to covariance by inversion (Monte Carlo)
Sigma_samples = np.linalg.inv(Lambda_samples)
Sigma_mean_mc = Sigma_samples.mean(axis=0)

print("Posterior df:", nu_post)
print("Posterior scale (precision):")
print(np.round(S_post, 4))
print()
print("E[Lambda|data] (closed form):")
print(np.round(Lambda_mean, 4))
print("E[Sigma|data] (MC via inversion):")
print(np.round(Sigma_mean_mc, 4))
Posterior df: 34
Posterior scale (precision):
[[ 0.0314 -0.0074]
 [-0.0074  0.0235]]

E[Lambda|data] (closed form):
[[ 1.0692 -0.2528]
 [-0.2528  0.7982]]
E[Sigma|data] (MC via inversion):
[[1.1047 0.3462]
 [0.3462 1.4882]]
# 10.3 Generative modeling: random covariance matrices

p = 2
nu = 25

Sigma_target = np.array([
    [1.0, 0.5],
    [0.5, 1.3],
])

# Sample a random scatter matrix with mean nu * Sigma_target
W = wishart_rvs_bartlett(nu, Sigma_target, size=1, rng=rng)

# Turn it into a random covariance estimate with mean Sigma_target
Sigma_sample = W / nu

Y = rng.multivariate_normal(mean=np.zeros(p), cov=Sigma_sample, size=300)

print("Sigma target:")
print(np.round(Sigma_target, 3))
print("Sigma sampled:")
print(np.round(Sigma_sample, 3))

fig = px.scatter(
    x=Y[:, 0],
    y=Y[:, 1],
    opacity=0.6,
    title="Data generated from a random covariance (Sigma_sample)",
    labels={"x": "y1", "y": "y2"},
)
fig.show()
Sigma target:
[[1.  0.5]
 [0.5 1.3]]
Sigma sampled:
[[0.982 0.762]
 [0.762 1.08 ]]

11) Pitfalls#

  • Parameter constraints:

    • (\nu > p-1) is required for a proper density.

    • (\Sigma) must be symmetric positive definite (not just semidefinite).

  • SPD checks are numerical: floating-point symmetry / PSD checks need tolerances.

  • Determinants and inverses can overflow/underflow:

    • prefer logpdf over pdf

    • use slogdet and solves (np.linalg.solve) instead of explicit inverses

  • CDF is hard: for (p>1), the matrix CDF is not available in SciPy; use scalar functionals or Monte Carlo.

  • Inverse-Wishart is often what you want: many Bayesian covariance models place a prior on (\Sigma), not on (\Sigma^{-1}).

  • Interpretation: (\nu) scales the mean; looking at (W/\nu) is often more interpretable as a noisy estimate of (\Sigma).
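The underflow pitfall is concrete even at moderate dimension; a sketch (the dimension and df below are arbitrary choices):

```python
import numpy as np
from scipy.stats import wishart

p, nu = 30, 60
rv = wishart(df=nu, scale=np.eye(p))

W = rv.rvs(random_state=np.random.default_rng(5))
print(rv.pdf(W))     # 0.0 -- the density underflows in double precision
print(rv.logpdf(W))  # finite (a large negative number)
```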

12) Summary#

  • Wishart is a continuous distribution on SPD matrices, parameterized by degrees of freedom (\nu) and scale (\Sigma).

  • It is the sampling distribution of Gaussian scatter matrices: (W=\sum_i x_i x_i^\top).

  • Key formulas:

    • (\mathbb{E}[W]=\nu\Sigma)

    • (\mathrm{Cov}(W_{ij},W_{k\ell})=\nu(\Sigma_{ik}\Sigma_{j\ell}+\Sigma_{i\ell}\Sigma_{jk}))

    • diagonal entries are scaled chi-square

    • matrix MGF: (|I-2T\Sigma|^{-\nu/2})

  • For simulation, Bartlett decomposition provides an efficient NumPy-only sampler.

  • In SciPy, scipy.stats.wishart supports pdf/logpdf, rvs, mean/var/mode/entropy, but not cdf or fit.